Cluster-level tuning of a shallow water equation solver on the Intel MIC architecture

Authors

  • Andrey Vladimirov
  • Cliff Addison
Abstract

This paper demonstrates the optimization of the execution environment of a hybrid OpenMP+MPI computational fluid dynamics code (a shallow water equation solver) on a cluster equipped with Intel Xeon Phi coprocessors. The discussion covers: (1) controlling the number and affinity of OpenMP threads to optimize access to memory bandwidth; (2) tuning the interoperation of OpenMP and MPI to partition the problem for better data locality; (3) ordering the MPI ranks so that some of the traffic is directed into faster communication channels; and (4) using efficient peer-to-peer communication between Xeon Phi coprocessors over the InfiniBand fabric. With tuning, the application achieves 90% parallel scaling efficiency on up to 8 Intel Xeon Phi coprocessors in 2 compute nodes. For larger problems, scalability is even better because of the greater computation-to-communication ratio; however, problems of that size do not fit in the memory of one coprocessor. The performance of the solver on one Intel Xeon Phi coprocessor 7120P exceeds the performance on a dual-socket Intel Xeon E5-2697 v2 CPU by a factor of 1.6. In a 2-node cluster with 4 coprocessors per compute node, the MIC architecture yields 5.8x more performance than the CPUs. Only one line of legacy Fortran code had to be changed to achieve the reported performance on the MIC architecture (not counting changes to the command-line interface). The methodology discussed in this paper is directly applicable to other bandwidth-bound stencil algorithms that use a hybrid OpenMP+MPI approach.
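The scaling figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is only an illustration, not taken from the paper: the function and variable names are hypothetical, and it assumes that each compute node holds one dual-socket CPU system, that CPU performance scales perfectly across the 2 nodes, and that the 90% efficiency figure applies to the same run as the cluster comparison.

```python
def speedup(n_devices: int, efficiency: float) -> float:
    """Speedup over a single device for a given parallel scaling efficiency."""
    return n_devices * efficiency


# Figures quoted in the abstract.
N_COPROCESSORS = 8   # 4 coprocessors per node, 2 nodes
EFFICIENCY = 0.90    # parallel scaling efficiency on 8 coprocessors
PHI_VS_CPU = 1.6     # one coprocessor vs. one dual-socket CPU system
N_NODES = 2

# Speedup of 8 coprocessors over 1 coprocessor.
mic_speedup = speedup(N_COPROCESSORS, EFFICIENCY)

# Assumption (not from the paper): the CPUs of the 2 nodes scale perfectly,
# so the CPU baseline is N_NODES times one dual-socket system.
cluster_ratio = PHI_VS_CPU * mic_speedup / N_NODES

print(f"8-coprocessor speedup over one coprocessor: {mic_speedup:.2f}x")
print(f"Estimated coprocessors-to-CPUs ratio: {cluster_ratio:.2f}x")
```

Under these assumptions the estimate comes out close to the 5.8x reported in the abstract, which suggests the quoted single-device and cluster-level numbers are mutually consistent.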

Similar resources

Variational Principle for the Generalized KdV-Burgers Equation with Fractal Derivatives for Shallow Water Waves

The unsmooth boundary will greatly affect motion morphology of a shallow water wave, and a fractal space is introduced to establish a generalized KdV-Burgers equation with fractal derivatives. The semi-inverse method is used to establish a fractal variational formulation of the problem, which provides conservation laws in an energy form in the fractal space and possible solution structures of t...

Study of parallel programming models on computer clusters with Intel MIC coprocessors

Coprocessors based on the Intel Many Integrated Core (MIC) Architecture have been adopted in many high-performance computer clusters. Typical parallel programming models, such as MPI and OpenMP, are supported on MIC processors to achieve parallelism. In this work, we conduct a detailed study on the performance and scalability of the MIC processors under different programming models using the...

Multi-Kepler GPU vs. multi-Intel MIC for spin systems simulations

We present and compare the performances of two many-core architectures: the Nvidia Kepler and the Intel MIC both in a single system and in cluster configuration for the simulation of spin systems. As a benchmark we consider the time required to update a single spin of the 3D Heisenberg spin glass model by using the Over-relaxation algorithm. We present data also for a traditional high-end multi...

Visual Exploration of Data with Multithread MIC Computer Architectures

Knowledge mining from immense datasets requires fast, reliable and affordable tools for their visual and interactive exploration. Multidimensional scaling (MDS) is a good candidate for embedding of high-dimensional data into visually perceived 2-D and 3-D spaces. We focus here on the way to increase the computational performance of MDS in the context of interactive, hierarchical, visualization ...

First experiences with the Intel MIC architecture at LRZ

With the rapidly growing demand for computing power, new accelerator-based architectures have entered the world of high performance computing over the past 5 years or so. In particular, GPGPUs have recently become very popular; however, programming GPGPUs using programming languages like CUDA or OpenCL is cumbersome and error-prone. Trying to overcome these difficulties, Intel developed their own Many Int...

Journal title:
  • CoRR

Volume abs/1408.1727

Publication date 2014